LU Factorization Measurement and Optimization
Abstract
In this paper, we analyze the runtime performance of the LU factorization algorithm. We first benchmark the basic algorithm and a blocked variant under different compiler flags. We then apply further targeted optimizations to the blocked code to obtain the best runtime performance we can.
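For readers unfamiliar with the two variants the abstract compares, the following is a minimal sketch of a basic (unblocked) LU factorization and a right-looking blocked one. It is not the authors' code: the function names `lu_basic`, `lu_blocked`, and `reconstruct`, the block size parameter `bs`, and the omission of pivoting are all assumptions made for illustration, and pure Python is used only to show the loop structure, not to reproduce any timings.

```python
def lu_basic(a):
    """Unblocked LU without pivoting (illustrative assumption), in place.

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        for i in range(k + 1, n):
            a[i][k] /= pivot              # multiplier l_ik
            lik = a[i][k]
            for j in range(k + 1, n):
                a[i][j] -= lik * a[k][j]  # eliminate below the pivot
    return a


def lu_blocked(a, bs=2):
    """Right-looking blocked LU without pivoting, in place.

    `bs` is a hypothetical block-size parameter. Each step factors a
    panel, solves for a block row of U, then updates the trailing matrix.
    """
    n = len(a)
    for kb in range(0, n, bs):
        ke = min(kb + bs, n)
        # 1. Panel factorization: columns kb..ke-1, all rows below.
        for k in range(kb, ke):
            pivot = a[k][k]
            for i in range(k + 1, n):
                a[i][k] /= pivot
                lik = a[i][k]
                for j in range(k + 1, ke):
                    a[i][j] -= lik * a[k][j]
        # 2. Block row of U: forward substitution with the unit-lower L11.
        for k in range(kb, ke):
            for i in range(k + 1, ke):
                lik = a[i][k]
                for j in range(ke, n):
                    a[i][j] -= lik * a[k][j]
        # 3. Trailing update: A22 -= L21 * U12 (the matrix-multiply part).
        for i in range(ke, n):
            for k in range(kb, ke):
                lik = a[i][k]
                for j in range(ke, n):
                    a[i][j] -= lik * a[k][j]
    return a


def reconstruct(lu):
    """Multiply the packed L and U factors back together (for checking)."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out
```

Both functions perform the same arithmetic; the blocked version only reorders it so that most of the work falls in step 3, which touches a small set of rows repeatedly and is therefore friendlier to caches in a compiled implementation.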
Similar resources
THE USE OF SEMI INHERITED LU FACTORIZATION OF MATRICES IN INTERPOLATION OF DATA
Polynomial interpolation in the one-dimensional space R is an important method for approximating functions. The Lagrange and Newton methods are two well-known types of interpolation. In this work, we describe the semi inherited interpolation for approximating the values of a function. In this case, the interpolation matrix has the semi inherited LU factorization.
On the WZ Factorization of the Real and Integer Matrices
The QIF (Quadrant Interlocking Factorization) method of Evans and Hatzopoulos solves systems of linear equations using WZ factorization. The WZ factorization can be faster than the LU factorization because it evaluates two columns or two rows simultaneously. Here, we present a method for computing the real and integer WZ and ZW factoriz...
Computing a block incomplete LU preconditioner as the by-product of block left-looking A-biconjugation process
In this paper, we present a block version of the incomplete LU preconditioner, which is computed as the by-product of the block A-biconjugation process. The pivot entries of this block preconditioner are one-by-one or two-by-two blocks. The L and U factors of this block preconditioner are computed separately. The block pivot selection of this preconditioner is inherited from one of the block versions of...
Locality Optimization on a NUMA Architecture for Hybrid LU Factorization
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies ...
Optimization of an LU Factorization Routine Using Communication/Computation Overlap
This report presents work on the LU factorization from the ScaLAPACK library. First, a complexity analysis is given. It makes it possible to compute the optimal block size for the block scattered distribution used in ScaLAPACK. It also identifies the communication phases that are worth overlapping. Second, two optimizations based on computation/communication overlap are given with experimental res...